AN ELEMENTARY EXPOSITION OF THE THEOREM OF
BERNOULLI WITH APPLICATIONS TO STATISTICS

By Professor H. L. RIETZ, University of Iowa 

In connection with the wide use of certain statistical methods 
within the past few years, there is coming to be some recognition 
of the importance of establishing a measure of the degrees of 
confidence that can properly be placed in inferences from statis- 
tical results such as mean values, standard deviations and 
coefficients of correlation. This recognition is shown in the in- 
creased application of probable errors, even if the applications 
are in many cases made without knowledge of the derivations or 
limitations of the formulas employed. The need for such cri- 
teria in passing judgment on the significance of the simplest of 
statistical results may perhaps be made clear by an appropriate 
illustration. 

In the third edition of American Men of Science by Cattell
and Brimhall, it is stated that a group of "scientific men re- 
ported 716 sons and 668 daughters." The valid inference is 
drawn that the "difference falls within the limits of chance 
variation, and is not likely to be significant." On the same 
page, 804, we find that a group of scientific men report 1705 
brothers and 1527 sisters. These data suggest the following 
questions of simple statistical sampling: What is the probability
in throwing 1384 coins that the number of heads will
differ from 1384/2 = 692 by as much as or more than 692 - 668 =
24? What is the probability in throwing 3232 coins that the
number of heads will differ from 3232/2 = 1616 by as much as or
more than 1705 - 1616 = 89? The answers to these questions
are obtained by an application of what is known in certain im- 
portant mathematical literature* as the theorem of Bernoulli, 
although the theorem in the form in which we shall use it contains
much in addition** to the Bernoulli theorem as it appeared
in the latter's works.



* See German and French Encyclopedias of Mathematics — Papers on 
Probability. 

** See Laplace, Théorie Analytique des Probabilités, Introduction,
p. XLVII; Bertrand, Calcul des Probabilités, Chapter V.
THE MATHEMATICS TEACHER

Theorem of Bernoulli 
We shall assume that during a set of s trials, the probability 
of the happening of an event is a constant p from trial to trial, 
and that 

$$p + q = 1.$$

Then the probabilities that the event will happen exactly
$s, s-1, s-2, \ldots, 1, 0$ times in s trials are given by the
successive terms of the binomial expansion

$$(p + q)^s = p^s + s\,p^{s-1}q + \cdots + \frac{s!}{m!\,(s-m)!}\,p^m q^{s-m} + \cdots + s\,p\,q^{s-1} + q^s. \tag{1}$$

To find the "most probable" number of happenings, we seek 
the value of m to which a maximum term of (1) corresponds. 
It is easily shown that

m = ps 

gives the maximum term if ps is an integer. When ps is not an
integer, the integer m such that

$$ps - q \le m \le ps + p$$

gives a most probable value.

When ps — q and ps + p are integers, there occur two equal 
terms in (1) each of which is larger than any other term of (1). 
For example, note the equality of the first and second terms of 
the expansion of 

(i + iV- 
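The bounds on the most probable number can be checked by direct enumeration. The Python sketch below is an illustration only: the function names and the sample values of p and s are assumptions chosen here, not taken from the paper. Exact fractions are used so that two equal maximum terms compare as equal.

```python
from fractions import Fraction
from math import comb


def binomial_terms(p, s):
    """Exact terms of (p + q)^s, indexed by the number of happenings m = 0..s."""
    q = 1 - p
    return [comb(s, m) * p**m * q**(s - m) for m in range(s + 1)]


def most_probable(p, s):
    """All values of m whose term in expansion (1) is a maximum."""
    terms = binomial_terms(p, s)
    top = max(terms)
    return [m for m, term in enumerate(terms) if term == top]


def within_bounds(p, s):
    """Check ps - q <= m <= ps + p for every most probable m."""
    q = 1 - p
    return all(p * s - q <= m <= p * s + p for m in most_probable(p, s))


# Hypothetical cases: ps an integer, ps not an integer, and a tie
# (ps - q and ps + p both integers).
for p, s in [(Fraction(1, 2), 1000), (Fraction(3, 10), 11), (Fraction(2, 3), 2)]:
    print(p, s, most_probable(p, s), within_bounds(p, s))
```

In the last case, ps - q = 1 and ps + p = 2 are both integers, and the enumeration returns both values of m.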

For the present, let us assume that ps is an integer, and let us 
represent the terms of (1) by ordinates of the curve 

$$y_x = f(x),$$

where x marks deviations from the maximum term as an origin. 
Then we have

$$y_x = \frac{s!}{(ps + x)!\,(qs - x)!}\; p^{ps+x}\, q^{qs-x} \tag{2}$$

and

$$y_{-x} = \frac{s!}{(ps - x)!\,(qs + x)!}\; p^{ps-x}\, q^{qs+x}. \tag{3}$$



By making x = 0 in (2) or in (3), we have the maximum
ordinate

$$y_0 = \frac{s!}{(ps)!\,(qs)!}\; p^{ps}\, q^{qs}.$$

With values of s that are reasonably large for statistical pur- 
poses, it is usually impractical to calculate the factorial in (2) 
and (3) without some special methods of approximation. Such 
a method is provided by an application of Stirling's theorem for
the representation of large factorials.
This theorem states that*

$$s! = s^{\,s+1/2}\, e^{-s} \sqrt{2\pi} \quad \text{approximately.} \tag{4}$$
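The quality of approximation (4) is easy to inspect numerically. The following Python sketch is an illustration under assumed sample values of s; the function name `stirling` is chosen here and does not come from the paper.

```python
import math


def stirling(s):
    """Approximation (4): s! ~ s^(s + 1/2) * e^(-s) * sqrt(2*pi)."""
    return s ** (s + 0.5) * math.exp(-s) * math.sqrt(2 * math.pi)


# Ratio of the approximation to the exact factorial for a few sample sizes.
for s in (5, 10, 50, 100):
    ratio = stirling(s) / math.factorial(s)
    print(s, ratio)
```

The ratio approaches 1 from below as s grows, the shortfall being roughly one part in 12s.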

The substitution of this value for s! and corresponding values
for (ps)! and (qs)! in

$$\frac{s!}{(ps)!\,(qs)!}\; p^{ps}\, q^{qs}$$

gives, after some simplification,

$$y_0 = \frac{1}{\sqrt{2\pi spq}}.$$

To illustrate, the most probable value in throwing 1000 coins, 
namely 500 heads and 500 tails, has a probability 

$$y_0 = \frac{1}{\sqrt{500\pi}} = .02523.$$

It is important to note that this most probable value is not likely
to be obtained in a single trial since its probability is only a
little more than 1/40. It may be of interest to the reader to compare
the simplicity of the calculation of $y_0$ as above with the
calculation of

$$\frac{1000!}{500!\,500!} \left(\frac{1}{2}\right)^{1000}$$

by logarithms.
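With a modern machine the comparison suggested here takes a few lines. The Python sketch below, whose function names are assumptions made for illustration, computes the approximate maximum ordinate and the exact binomial term for 1000 fair coins.

```python
import math


def max_ordinate(s, p):
    """Approximate maximum term y_0 = 1 / sqrt(2*pi*s*p*q)."""
    q = 1 - p
    return 1 / math.sqrt(2 * math.pi * s * p * q)


def exact_max_term(s):
    """Exact probability of s/2 heads in s throws of a fair coin (s even)."""
    return math.comb(s, s // 2) / 2**s


approx = max_ordinate(1000, 0.5)
exact = exact_max_term(1000)
print(f"approximate: {approx:.5f}  exact: {exact:.5f}")
```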

By the application of Stirling's theorem to (2), we obtain, 
after slight simplification, 

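$$y_x = \frac{1}{\sqrt{2\pi spq}} \left(1 + \frac{x}{ps}\right)^{-(ps + x + 1/2)} \left(1 - \frac{x}{qs}\right)^{-(qs - x + 1/2)}. \tag{5}$$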



* For proof, see Whittaker and Watson, Modern Analysis, Third Edition,
p. 253; Czuber, Wahrscheinlichkeitsrechnung, I, 1908, p. 22.
In practical statistical problems, we are in general interested
in deviations, x, that are small compared to the most probable
value ps. In fact, we may well confine our attention to deviations
that are of the order of $\sqrt{s}$ when we are discussing fluctuations
in sampling. With this limitation on x, it is shown in the
appendix to this paper that the sum

$$y_x + y_{-x} = \frac{2}{\sqrt{2\pi pqs}}\; e^{-x^2/(2pqs)} \quad \text{approximately.} \tag{6}$$

Since we very commonly make our inquiries about a given
deviation on either side of the most probable, we are interested
in the sum $y_x + y_{-x}$ given by (6).
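Formula (6) can be compared with the exact sum of the two binomial terms. The Python sketch below does this for fair coins with s = 1384, the first sample size discussed in this paper; the function names and the particular deviations x are assumptions chosen for illustration.

```python
import math


def exact_pair(s, x):
    """Exact y_x + y_{-x} for a fair coin (p = q = 1/2), s even, x >= 1."""
    half = s // 2
    return (math.comb(s, half + x) + math.comb(s, half - x)) / 2**s


def approx_pair(s, p, x):
    """Formula (6): (2 / sqrt(2*pi*p*q*s)) * exp(-x^2 / (2*p*q*s))."""
    pqs = p * (1 - p) * s
    return 2 / math.sqrt(2 * math.pi * pqs) * math.exp(-x * x / (2 * pqs))


s = 1384
# Deviations up to about sqrt(s), the range to which the text confines itself.
for x in (1, 12, 24, 37):
    print(x, exact_pair(s, x), approx_pair(s, 0.5, x))
```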

The limitation that ps be an integer may well be removed. 
From what was shown above about the most probable value, we 
may in any case write for the most probable number of hap- 
penings 

$$m = ps + k$$
where $-1 < k < +1$.

With larger values of s, it can be shown that the difference 
brought about by the use of ps + k instead of ps is of negligible
importance in our problem of fluctuations in sampling. 

We are particularly interested in finding the probability that 
the variable deviation x will remain within assigned bounds, say 
within d and — d inclusive. To find this probability requires a 
method of finding the sum 

$$y_d + y_{d-1} + \cdots + y_1 + y_0 + y_{-1} + \cdots + y_{-d} = \sum_{x=-d}^{d} y_x. \tag{7}$$

As there is likely to be a large number of y's, some special
method of finding the sum is of practical value. The sum of 
such a large set of ordinates may be found by the Euler-Mac- 
laurin formula of the calculus of finite differences. For the 
purpose of a sampling problem, a simpler method of approxima- 
tion than that provided by the Euler-Maclaurin Theorem is suit- 
able. It will be convenient in what follows to make 
$$y'_x = \frac{y_x + y_{-x}}{2} = \frac{1}{\sqrt{2\pi pqs}}\; e^{-x^2/(2pqs)}. \tag{8}$$

We may now well conceive of obtaining the approximate sum
of the ordinates in (8) by finding the area enclosed by the curve,
the x-axis, and the ordinates x = -d - 1/2 and x = d + 1/2.
For this purpose, we use as an approximate value of the area
the integral

$$\int_{-d-1/2}^{\,d+1/2} y'_x \, dx = 2\int_{0}^{\,d+1/2} y'_x \, dx = \frac{2}{\sqrt{2\pi pqs}} \int_{0}^{\,d+1/2} e^{-x^2/(2pqs)} \, dx. \tag{9}$$



The theorem of Bernoulli may now be stated by saying, (1) that 
ps is a most probable value, (2) that formula (9) gives the probability
that a deviation x from the most probable will not exceed
an assigned deviation d on either side of the most probable. 
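Formula (9) can also be evaluated without a table. Substituting t = x/√(2pqs) turns the integral into the error function, so that (9) equals erf((d + 1/2)/√(2pqs)). The Python sketch below relies on that substitution and on `math.erf` from the standard library; the function name `prob_within` is an assumption made here for illustration.

```python
import math


def prob_within(s, p, d):
    """Formula (9): probability that the deviation from ps is at most d.

    Substituting t = x / sqrt(2*p*q*s) turns (9) into
    erf((d + 1/2) / sqrt(2*p*q*s)).
    """
    pqs = p * (1 - p) * s
    return math.erf((d + 0.5) / math.sqrt(2 * pqs))


# The two questions posed at the start of the paper (fair coin, p = 1/2).
for s, d in [(1384, 24), (3232, 89)]:
    inside = prob_within(s, 0.5, d)
    print(f"s = {s}, d = {d}: within = {inside:.5f}, beyond = {1 - inside:.5f}")
```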

The numerical value of (9) is readily obtained in any parti- 
cular case by the use of the normal probability integral as we 
shall show by applications to the numerical questions proposed 
on pp. 1 and 2. 

In the first question,

$$s = 1384,$$
$$pqs = 346,$$
$$\sqrt{pqs} = 18.601,$$
$$d = 716 - 692 = 24,$$
$$\frac{d + \tfrac{1}{2}}{\sqrt{pqs}} = 1.3171.$$

From a table of the normal probability integral (Table IV,
Davenport, Statistical Methods), we find for the deviation

$$\frac{x}{\sqrt{pqs}} = \frac{d + \tfrac{1}{2}}{\sqrt{pqs}} = 1.3171,$$

the value of (9) to be P = .8122.
This is the probability in throwing 1384 coins that the deviation
of the number of heads from 1384/2 = 692 will not exceed 24.
The probability of a deviation greater than 24 is then 1 - .8122
= .1878. Expressed in another way, we may say we should
predict that, in the long run, a deviation greater than 24 on 
either side of the most probable will occur slightly less than 
once per five trials. In the second question, 

$$s = 3232,$$
$$\sqrt{pqs} = 28.425,$$
$$d = 1705 - 1616 = 89,$$
$$\frac{d + \tfrac{1}{2}}{\sqrt{pqs}} = 3.1486.$$

Referring now to a table of the normal probability integral, we 
find for the deviation 

$$\frac{x}{\sqrt{spq}} = \frac{d + \tfrac{1}{2}}{\sqrt{spq}} = 3.1486,$$

the value of (9) to be 

P = .99836. 

The probability of a deviation greater than the assigned deviation
on either side of the most probable, 3232/2 = 1616, is then
1 - .99836 = .00164.

In other words, we predict a deviation larger than 89 on either
side of the most probable should occur in the long run about once
per 1/.00164 trials, or roughly once per 600 trials. We thus have
a quantitative criterion to judge of the significance of the given 
deviation compared to fluctuations in simple sampling. 
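The two results can be set beside the exact binomial sums, which are now easy to compute. In the Python sketch below the function names are assumptions made for illustration; the exact sum adds the binomial terms directly, and the approximation is one minus formula (9), evaluated through the error function rather than a table.

```python
import math


def exact_prob_beyond(s, d):
    """Exact probability, for s fair coins (s even), that the number of
    heads differs from s/2 by more than d."""
    half = s // 2
    inside = sum(math.comb(s, k) for k in range(half - d, half + d + 1))
    return 1 - inside / 2**s


def approx_prob_beyond(s, d):
    """One minus formula (9) with p = q = 1/2, via the error function."""
    pqs = s / 4
    return 1 - math.erf((d + 0.5) / math.sqrt(2 * pqs))


for s, d in [(1384, 24), (3232, 89)]:
    print(s, d, exact_prob_beyond(s, d), approx_prob_beyond(s, d))
```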

By dividing the frequencies under consideration by the total 
numbers involved, we have relative frequencies, and the theorem 
of Bernoulli may be stated in terms of these relative frequencies. 
We should then regard the theorem as furnishing a criterion 
for testing whether the deviation of a statistical ratio from an 
assumed probability can be reasonably regarded as a fluctuation 
in sampling. 
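Stated for relative frequencies, the criterion might be sketched as follows in Python. The function name, the tolerance `eps`, and the rule of taking d as the largest whole number not exceeding eps times s are all assumptions made here for illustration, and ps is assumed to be an integer.

```python
import math


def prob_ratio_within(s, p, eps):
    """Probability, by formula (9), that the ratio (happenings / s)
    differs from p by at most eps. Assumes ps is an integer."""
    # Largest whole-number deviation allowed; the small offset guards
    # against floating-point round-off when eps * s is a whole number.
    d = math.floor(eps * s + 1e-9)
    pqs = p * (1 - p) * s
    return math.erf((d + 0.5) / math.sqrt(2 * pqs))


# Hypothetical tolerance of 0.02 around p = 1/2 for two sample sizes.
for s in (1384, 10000):
    print(s, prob_ratio_within(s, 0.5, 0.02))
```

For a fixed tolerance the probability rises toward 1 as s increases.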

To summarize, we may say that the theorem of Bernoulli states 
(1) the number of happenings that is most probable, (2) the 
probability, in making s trials with constant probability p, that 
the departure of the number of successes from sp will not exceed
a given number, or that the departure of a statistical ratio from 
a known constant probability p will not exceed a given value. 

The converse theorem to that of Bernoulli is of great import- 
ance in statistics. That is, to determine the probability that an 
unknown probability of an event will not deviate more than an 
assigned value from a statistical ratio is a problem of much 
interest. We shall not attempt here to give the reasoning by 
which the converse is established because of limitations of space 
and because the purpose of this paper is accomplished by giving
a view of the method of treating one of the simplest problems of 
fluctuations in sampling. 

Appendix 
Given the function 

$$y_x = \frac{1}{\sqrt{2\pi spq}} \left(1 + \frac{x}{ps}\right)^{-(ps + x + 1/2)} \left(1 - \frac{x}{qs}\right)^{-(qs - x + 1/2)}, \tag{1'}$$

marked (5) above, to show that 

$$y_x + y_{-x} = \frac{2}{\sqrt{2\pi pqs}}\; e^{-x^2/(2pqs)} \quad \text{approximately} \tag{2'}$$

under the limitations on x stated on p. 5. 

From (1') 

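$$\log y_x = -\tfrac{1}{2} \log (2\pi pqs) - \left(ps + x + \tfrac{1}{2}\right) \log\left(1 + \frac{x}{ps}\right) - \left(qs - x + \tfrac{1}{2}\right) \log\left(1 - \frac{x}{qs}\right).$$

By expanding $\log\left(1 + \frac{x}{ps}\right)$ and $\log\left(1 - \frac{x}{qs}\right)$ in series, and
simplifying, we have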

$$\log y_x = -\tfrac{1}{2} \log (2\pi pqs) + \frac{(p - q)x}{2pqs} - \frac{x^2}{2pqs} + \frac{x^3}{6p^2s^2} - \frac{x^3}{6q^2s^2} + \cdots. \tag{3'}$$
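Hence,

$$y_x = \frac{1}{\sqrt{2\pi spq}}\; \exp\left[\frac{(p - q)x}{2pqs} - \frac{x^2}{2pqs} + \frac{x^3}{6p^2s^2} - \frac{x^3}{6q^2s^2} + \cdots\right]$$

$$= \frac{1}{\sqrt{2\pi pqs}}\; e^{-x^2/(2pqs)} \left(1 + \frac{(p - q)x}{2pqs} + \frac{x^3}{6p^2s^2} - \frac{x^3}{6q^2s^2} + \cdots\right). \tag{4'}$$

Similarly,

$$y_{-x} = \frac{1}{\sqrt{2\pi pqs}}\; e^{-x^2/(2pqs)} \left(1 - \frac{(p - q)x}{2pqs} - \frac{x^3}{6p^2s^2} + \frac{x^3}{6q^2s^2} + \cdots\right). \tag{5'}$$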

In the applications to statistics, we are generally interested in 
deviations, x, that are small compared to ps and qs. In fact, we 
may well confine our attention in the treatment of fluctuations
in sampling to values of x not exceeding $\sqrt{s}$ in order of magnitude.
Thus, in (4') we have not retained in the parenthesis
the term

$$\frac{(p - q)^2 x^2}{2\,(2pqs)^2}.$$

Under this limitation on x, the sum of

terms beyond those written in (4') and (5') may be taken as 
negligible for our purposes. 
Hence, we have from (4') and (5'), 



$$y_x + y_{-x} = \frac{2}{\sqrt{2\pi pqs}}\; e^{-x^2/(2pqs)} \quad \text{approximately.}$$
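The series step of this appendix can be checked numerically. The Python sketch below compares the logarithm of (1') with the series (3') cut off after the cubic terms; the function names and the sample values s = 1000, p = 0.3 are assumptions chosen for illustration, with x kept within about the square root of s.

```python
import math


def log_y_closed(s, p, x):
    """Logarithm of (1'), before any series expansion."""
    q = 1 - p
    return (-0.5 * math.log(2 * math.pi * p * q * s)
            - (p * s + x + 0.5) * math.log1p(x / (p * s))
            - (q * s - x + 0.5) * math.log1p(-x / (q * s)))


def log_y_series(s, p, x):
    """Series (3') truncated after the cubic terms."""
    q = 1 - p
    return (-0.5 * math.log(2 * math.pi * p * q * s)
            + (p - q) * x / (2 * p * q * s)
            - x**2 / (2 * p * q * s)
            + x**3 / (6 * p**2 * s**2)
            - x**3 / (6 * q**2 * s**2))


s, p = 1000, 0.3  # hypothetical values for illustration
# A negative x gives the ordinate on the other side of the maximum.
for x in (-30, -10, 10, 30):
    print(x, log_y_closed(s, p, x), log_y_series(s, p, x))
```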



